Abstract:
Image conversion has attracted mounting attention due to its practical applications. This paper proposes a lightweight network structure, based on the generative adversarial network (GAN) and a fixed-parameter edge-detection convolution kernel, that can be trained on unpaired data to learn a one-way image mapping. Compared with the cycle-consistent adversarial network (CycleGAN), the proposed network has a simpler structure, fewer parameters (only 37.48% of the parameters in CycleGAN), and a lower training cost (only 35.47% of the GPU memory usage and 17.67% of the single-iteration time of CycleGAN). Remarkably, cycle consistency is no longer mandatory for keeping content consistent before and after the mapping. The network achieves strong results on several image translation tasks, and its effectiveness is demonstrated through representative experiments. In a quantitative classification evaluation based on VGG-16, the proposed algorithm achieves superior performance.
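The abstract's key idea, replacing cycle consistency with a fixed edge-detection kernel as a content constraint, can be illustrated with a minimal sketch. The paper does not specify the kernel, so a depthwise Sobel pair and an L1 edge-consistency loss are assumed here purely for illustration.

```python
# Minimal sketch of a fixed-parameter edge-detection convolution used as a
# content-consistency constraint. The exact kernel is not published in the
# abstract; a Sobel pair is an assumption for illustration.
import torch
import torch.nn.functional as F

def make_sobel_kernels(channels: int) -> torch.Tensor:
    gx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    gy = gx.t()
    # One (gx, gy) pair per input channel, applied depthwise.
    k = torch.stack([gx, gy]).unsqueeze(1)           # (2, 1, 3, 3)
    return k.repeat(channels, 1, 1, 1)               # (2*C, 1, 3, 3)

def edge_map(x: torch.Tensor, kernels: torch.Tensor) -> torch.Tensor:
    c = x.shape[1]
    g = F.conv2d(x, kernels.to(x), padding=1, groups=c)
    gx, gy = g[:, 0::2], g[:, 1::2]
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)      # gradient magnitude

def edge_consistency_loss(src: torch.Tensor, fake: torch.Tensor,
                          kernels: torch.Tensor) -> torch.Tensor:
    # L1 distance between edge maps of the input and the translated output;
    # the kernels carry no trainable parameters, so only the generator updates.
    return F.l1_loss(edge_map(fake, kernels), edge_map(src, kernels))
```

Because the kernel is frozen, this constraint adds no learnable parameters, which is consistent with the lightweight design the abstract emphasizes.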
Abstract:
The majority of image-to-image translation models tend to struggle in varying-domain settings, where samples within a domain vary significantly in shape and size and carry no domain labels. This paper proposes an unsupervised class-to-class translation model based on conditional contrastive learning to tackle the domain-variation problem. The underlying hypothesis is that the latent modalities of two varying domains can be categorized by the style differences among samples, which turns the image-to-image translation problem into class-to-class translation. First, unsupervised semantic clustering is performed to divide each domain into multiple classes, and the classification features of the different classes are then leveraged to perform class-to-class translation. Two conditional contrastive learning loss functions are proposed for each domain to perform the unsupervised semantic clustering and decompose the domain into multiple classes. In the class-to-class translation stage, the classification features of the different classes are employed to learn the latent modalities. By exploiting the latent modalities of different classes, the proposed model outperforms state-of-the-art baseline methods. The sample code is available at https://github.com/c1a1o1/ucct.
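The clustering stage can be pictured with a generic cluster-level contrastive loss. The paper's exact conditional formulation is not reproduced here, so the InfoNCE-over-cluster-assignments form below is an assumption, sketched over two augmented views of the same batch.

```python
# Illustrative cluster-level contrastive loss: columns of the softmax
# assignment matrices (one per cluster) act as contrastive units, and the
# same cluster in the other view is the positive. This is a generic form,
# not the paper's exact conditional losses.
import torch
import torch.nn.functional as F

def cluster_contrastive_loss(p1: torch.Tensor, p2: torch.Tensor,
                             temperature: float = 0.5) -> torch.Tensor:
    """p1, p2: (N, K) softmax cluster assignments of two views."""
    k = p1.shape[1]
    c1 = F.normalize(p1.t(), dim=1)                  # (K, N)
    c2 = F.normalize(p2.t(), dim=1)
    feats = torch.cat([c1, c2], dim=0)               # (2K, N)
    sim = feats @ feats.t() / temperature            # (2K, 2K)
    sim.fill_diagonal_(float('-inf'))                # drop self-similarity
    # Positive pair: the same cluster index in the other view.
    targets = torch.cat([torch.arange(k) + k, torch.arange(k)])
    return F.cross_entropy(sim, targets.to(sim.device))
```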
Abstract:
Recently, image-to-image translation has made much progress owing to the success of conditional Generative Adversarial Networks (cGANs), and unpaired methods based on cycle consistency loss, such as DualGAN, CycleGAN, and DiscoGAN, have become popular. However, translation tasks that require high-level visual information conversion remain very challenging; photo-to-caricature translation, for example, requires satire, exaggeration, lifelikeness, and artistry. We present an approach for learning to translate faces in the wild from the source photo domain to the target caricature domain in different styles, which can also be used for other high-level image-to-image translation tasks. To capture global structure together with local statistics during translation, we design a dual-pathway model with one coarse discriminator and one fine discriminator. For the generator, we add a perceptual loss alongside the adversarial and cycle consistency losses to achieve representation learning for the two domains, and style can be learned through an auxiliary noise input. Experiments on photo-to-caricature translation of faces in the wild show a considerable performance gain of the proposed method over state-of-the-art translation methods, as well as its potential in real applications. Our code is available at https://github.com/zhengziqiang/P2C.
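A hedged sketch of the dual-pathway generator objective follows. The downsampling factor for the coarse pathway, the least-squares adversarial form, and the loss weights are illustrative choices, not the authors' released settings.

```python
# Sketch of the dual-pathway idea: a "fine" discriminator sees the
# full-resolution output while a "coarse" one sees a downsampled copy, so
# global structure and local statistics are judged separately.
import torch
import torch.nn.functional as F

def generator_loss(fake, rec, real_src, d_fine, d_coarse, vgg_feat,
                   lam_cyc=10.0, lam_perc=1.0):
    """fake: translated image; rec: cycle-reconstructed source; real_src:
    input photo; d_fine/d_coarse: discriminators; vgg_feat: fixed feature
    extractor (e.g., a VGG slice). Weights are illustrative."""
    # Least-squares adversarial terms from both discriminator pathways.
    coarse_fake = F.avg_pool2d(fake, kernel_size=2)
    adv = ((d_fine(fake) - 1) ** 2).mean() + \
          ((d_coarse(coarse_fake) - 1) ** 2).mean()
    # Cycle consistency back to the source photo.
    cyc = F.l1_loss(rec, real_src)
    # Perceptual loss on fixed-network features.
    perc = F.l1_loss(vgg_feat(fake), vgg_feat(real_src))
    return adv + lam_cyc * cyc + lam_perc * perc
```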
Abstract:
Photo-to-caricature translation is an extremely challenging task because there are not only texture differences between caricatures and photos but also various spatial deformations in caricatures. Most existing methods tend to introduce additional information that is difficult to obtain, such as precise facial landmarks, to guide caricature generation. In addition, identity preservation is a crucial characteristic of caricatures, yet few methods take it into account. Motivated by these observations, we propose an Identity-Preservation Generative Adversarial Network (IPGAN) for unsupervised photo-to-caricature translation. In particular, considering the importance of identity retention, we propose a novel identity preservation loss to retain the identity information of the original photos and improve the quality of the generated caricatures. To capture realistic caricature styles, we design a style differentiation loss that helps the model produce caricatures whose styles differ markedly from photos. Moreover, to learn satisfactory deformations without supervision, the model uses a warp controller that acquires exaggerations automatically and enables diverse, customizable exaggerations. As an unsupervised translation method, IPGAN can also be applied to caricature-to-photo translation. Experiments on the WebCaricature dataset suggest that IPGAN achieves state-of-the-art performance and can generate realistic, identity-preserving caricatures.
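One plausible reading of the identity preservation loss is sketched below: the generated caricature's embedding under a frozen face-recognition network is pulled toward that of the source photo. The encoder (face_encoder) is a hypothetical stand-in, and the cosine form is an assumption; the abstract does not specify either.

```python
# Illustrative identity preservation loss under a frozen face-recognition
# network; "face_encoder" is a hypothetical embedding model, and the cosine
# formulation is assumed for this sketch.
import torch
import torch.nn.functional as F

def identity_preservation_loss(photo: torch.Tensor,
                               caricature: torch.Tensor,
                               face_encoder) -> torch.Tensor:
    with torch.no_grad():
        target = face_encoder(photo)      # identity embedding of the photo
    pred = face_encoder(caricature)       # gradients flow to the generator
    return 1.0 - F.cosine_similarity(pred, target, dim=1).mean()
```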
Abstract:
The creation of an image from another image, or from other types of data including text, scene graphs, and object layouts, is one of the most challenging tasks in computer vision. Moreover, capturing images of an object or product from different views can be exhausting and expensive to do manually. Using deep learning and artificial intelligence techniques, generating new images from different types of data has now become possible, and significant effort has recently been devoted to developing image generation strategies, with great success. To that end, this paper presents, to the best of the authors’ knowledge, the first comprehensive overview of existing image generation methods. Each image generation technique is described according to the nature of the adopted algorithms, the type of data used, and the main objective, and each image generation category is discussed through its proposed approaches. Existing image generation datasets are also presented. The evaluation metrics suitable for each image generation category are discussed, and a comparison of the performance of existing solutions is provided to characterize the state of the art and identify its limitations and strengths. Lastly, the current challenges facing this subject are presented.
Abstract:
Generative Adversarial Networks (GANs) have been extremely successful in various application domains such as computer vision, medicine, and natural language processing, and transforming an object or person into a desired shape has become a well-studied problem within GAN research. GANs are powerful models for learning complex distributions to synthesize semantically meaningful samples. However, the field lacks a comprehensive review, in particular a collection of GAN loss variants, evaluation metrics, remedies for diverse image generation, and techniques for stable training. Given the current rapid pace of GAN development, in this survey we provide a comprehensive review of adversarial models for image synthesis. We summarize synthetic image generation methods and discuss categories including image-to-image translation, fusion image generation, label-to-image mapping, and text-to-image translation. We organize the literature based on base models, ideas developed around architectures, constraints, loss functions, evaluation metrics, and training datasets. We present milestones of adversarial models, review an extensive selection of previous works in the various categories, and offer insights on the development route from model-based to data-driven methods. Further, we highlight a range of potential future research directions. A unique feature of this review is that all software implementations of these GAN methods and datasets have been collected and made available in one place at https://github.com/pshams55/GAN-Case-Study.
Abstract:
While image-to-image translation has been extensively studied, existing methods have a number of limitations when transforming between instances of different shapes from different domains. In this paper, a novel approach, hereafter referred to as ObjectVariedGAN, is proposed to handle geometric translation. Large and significant shape changes can arise during image-to-image translation, especially in object transfiguration; we therefore focus on synthesizing the desired results while maintaining the shape of the foreground object, without requiring paired training data. Specifically, the proposed approach learns the mapping between source and target domains where the shapes of objects differ significantly. A feature similarity loss is introduced to encourage generative adversarial networks (GANs) to capture the structural attributes of objects (e.g., object segmentation masks). Additionally, to allow training on unaligned datasets, cycle-consistency loss is combined with a context-preserving loss. Our approach feeds the generator the source image together with its instance segmentation mask and guides the network to generate the desired target-domain output. To verify the effectiveness of the proposed approach, extensive experiments are conducted on pre-processed examples from the MS-COCO dataset. A comparative summary of the findings demonstrates that ObjectVariedGAN outperforms competing approaches in terms of Inception Score, Fréchet Inception Distance, and human cognitive preference.
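How the mask might be fed to the generator can be sketched as channel concatenation, together with one possible cosine form of the feature similarity loss. Both are assumptions for illustration, since the abstract fixes neither detail.

```python
# Sketch of mask-conditioned generation and a feature similarity loss.
# Channel concatenation and the cosine form are illustrative assumptions.
import torch
import torch.nn.functional as F

def translate_with_mask(generator, image: torch.Tensor,
                        mask: torch.Tensor) -> torch.Tensor:
    """image: (N, 3, H, W); mask: (N, 1, H, W) binary foreground mask.
    Assumes the generator accepts a 4-channel input."""
    return generator(torch.cat([image, mask], dim=1))

def feature_similarity_loss(feat_src: torch.Tensor,
                            feat_fake: torch.Tensor) -> torch.Tensor:
    # Encourage the translated object to keep the structural features of
    # the source foreground.
    return 1.0 - F.cosine_similarity(feat_src.flatten(1),
                                     feat_fake.flatten(1), dim=1).mean()
```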
Abstract:
The paradigm of image-to-image translation is leveraged for the benefit of sketch stylization via transfer of geometric textural details. Lacking the necessary volumes of data for standard training of translation systems, we advocate for operation at the patch level, where a handful of stylized sketches provide ample mining potential for patches featuring basic geometric primitives. Operating at the patch level necessitates special consideration of full sketch translation, as individual translation of patches with no regard to neighbors is likely to produce visible seams and artifacts at patch borders. Aligned pairs of styled and plain primitives are combined to form input hybrids containing styled elements around the border and plain elements within, and given as input to a seamless translation (ST) generator, whose output patches are expected to reconstruct the fully styled patch. An adversarial addition promotes generalization and robustness to diverse geometries at inference time, forming a simple and effective system for arbitrary sketch stylization, as demonstrated upon a variety of styles and sketches.
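The input-hybrid construction lends itself to a short sketch: a styled patch supplies the border band while the aligned plain patch fills the interior, and the seamless-translation generator is trained to recover the fully styled patch. The border width below is an illustrative choice.

```python
# Minimal sketch of the input-hybrid construction: styled frame, plain
# interior. The border width is an assumed, illustrative value.
import torch

def make_hybrid(styled: torch.Tensor, plain: torch.Tensor,
                border: int = 8) -> torch.Tensor:
    """styled, plain: (N, C, H, W) aligned patch pairs, H and W > 2*border."""
    hybrid = styled.clone()
    # Overwrite the interior with the plain primitive, keeping a styled frame
    # so the generator learns to continue the style seamlessly inward.
    hybrid[:, :, border:-border, border:-border] = \
        plain[:, :, border:-border, border:-border]
    return hybrid
```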
Abstract:
Researchers have recently begun exploring the use of StyleGAN-based models for real image editing. One particularly interesting application is using natural language descriptions to guide the editing process. Existing approaches for editing images using language either resort to instance-level latent code optimization or map predefined text prompts to some editing directions in the latent space. However, these approaches have inherent limitations. The former is not very efficient, while the latter often struggles to effectively handle multi-attribute changes. To address these weaknesses, we present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes. The core of our method is the use of novel, lightweight text-conditioned adapter layers integrated into pretrained GAN-inversion networks. We demonstrate that by conditioning the initial inversion step on the Contrastive Language-Image Pre-training (CLIP) embedding of the target description, we are able to obtain more successful edit directions. Additionally, we use a CLIP-guided refinement step to make corrections in the resulting residual latent codes, which further improves the alignment with the text prompt. Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds, as shown by our qualitative and quantitative results.
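The adapter idea can be sketched as a scale-and-shift of an intermediate inversion-encoder feature map driven by the CLIP text embedding. This FiLM-style form is an assumption for illustration; CLIPInverter's actual adapter design may differ.

```python
# Illustrative text-conditioned adapter: the CLIP embedding of the target
# description modulates an encoder feature map. The FiLM-style form and the
# 512-dim embedding are assumptions, not the authors' exact layer.
import torch
import torch.nn as nn

class TextConditionedAdapter(nn.Module):
    def __init__(self, feat_channels: int, clip_dim: int = 512):
        super().__init__()
        self.to_scale = nn.Linear(clip_dim, feat_channels)
        self.to_shift = nn.Linear(clip_dim, feat_channels)

    def forward(self, feat: torch.Tensor,
                text_emb: torch.Tensor) -> torch.Tensor:
        """feat: (N, C, H, W); text_emb: (N, clip_dim) CLIP text embedding."""
        scale = self.to_scale(text_emb)[:, :, None, None]
        shift = self.to_shift(text_emb)[:, :, None, None]
        return feat * (1 + scale) + shift   # residual modulation of features
```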